Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives

Li, Chloe, Phuong, Mary, Tan, Daniel

arXiv.org Artificial Intelligence

As AI systems become more capable of complex agentic tasks, they also become more capable of pursuing undesirable objectives and causing harm. Previous work has attempted to catch these unsafe instances by interrogating models directly about their objectives and behaviors. However, the main weakness of trusting interrogations is that models can lie. We propose self-report fine-tuning (SRFT), a simple supervised fine-tuning technique that trains models to occasionally make factual mistakes, then admit them when asked. We show that the admission of factual errors in simple question-answering settings generalizes out-of-distribution (OOD) to the admission of hidden misaligned objectives in adversarial agentic settings. We evaluate SRFT in OOD stealth tasks, where models are instructed to complete a hidden misaligned objective alongside a user-specified objective without being caught by monitoring. After SRFT, models are more likely to confess the details of their hidden objectives when interrogated, even under strong pressure not to disclose them. Interrogation of SRFT models can detect hidden objectives with near-ceiling performance (F1 score = 0.98), while the baseline model lies when interrogated under the same conditions (F1 score = 0). Interrogation of SRFT models can further elicit the content of the hidden objective, recovering 28-100% of the details, compared to 0% of details recovered from the baseline model and by prefilled assistant-turn attacks. This provides a promising technique for promoting honesty propensity and incriminating misaligned AIs.
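The abstract describes training on examples where the model occasionally errs and then admits the error on a follow-up turn. A minimal sketch of how such SRFT-style training data might be assembled, assuming a standard chat-message format; the names (`build_srft_example`, `ERROR_RATE`) and the error rate are illustrative, not from the paper:

```python
import random

ERROR_RATE = 0.2  # illustrative fraction of examples containing a deliberate factual slip

def build_srft_example(question, correct, wrong, rng):
    """Return a chat transcript; with probability ERROR_RATE the first
    answer is wrong and a follow-up turn admits the mistake."""
    if rng.random() < ERROR_RATE:
        return [
            {"role": "user", "content": question},
            {"role": "assistant", "content": wrong},
            {"role": "user", "content": "Is that answer correct?"},
            {"role": "assistant",
             "content": f"No. I made a mistake: the correct answer is {correct}."},
        ]
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": correct},
    ]

rng = random.Random(0)
dataset = [build_srft_example("What is the capital of Australia?",
                              "Canberra", "Sydney", rng)
           for _ in range(100)]
# four-turn transcripts are the ones containing an admission
admissions = sum(len(ex) == 4 for ex in dataset)
```

The hypothesis tested in the paper is that fine-tuning on the admission turns generalizes from factual slips to confessing hidden objectives.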


PENCIL: Long Thoughts with Short Memory

Yang, Chenxiao, Srebro, Nathan, McAllester, David, Li, Zhiyuan

arXiv.org Artificial Intelligence

While recent works (e.g. o1, DeepSeek R1) have demonstrated the great promise of using long Chain-of-Thought (CoT) to improve reasoning capabilities of language models, scaling it up during test-time is challenging due to inefficient memory usage -- intermediate computations accumulate indefinitely in context even when they are no longer needed for future thoughts. We propose PENCIL, which incorporates a reduction mechanism into the autoregressive generation process, allowing the model to recursively clean up intermediate thoughts based on patterns learned from training. With this reduction mechanism, PENCIL significantly reduces the maximal context length required during generation, and thus can generate longer thoughts with limited memory, solving larger-scale problems given more thinking time. For example, we demonstrate PENCIL achieves 97% accuracy on the challenging Einstein's puzzle -- a task even large models like GPT-4 struggle with -- using only a small 25M-parameter transformer with 2048 context length. Theoretically, we prove PENCIL can perform universal space-efficient computation by simulating Turing machines with optimal time and space complexity, and thus can solve arbitrary computational tasks that would otherwise be intractable given context window constraints.
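The reduction mechanism can be illustrated with a toy string-rewriting rule, assuming special tokens that bracket an intermediate computation and its result; the token names and `reduce_context` helper below are illustrative stand-ins, not the paper's exact formulation:

```python
CALL, SEP, RET = "[CALL]", "[SEP]", "[RETURN]"

def reduce_context(tokens):
    """Apply one reduction  C [CALL] T [SEP] A [RETURN]  ->  C A:
    the intermediate thoughts T are erased and only the answer A
    is kept, so the working context stays short even as total
    generation grows."""
    if tokens and tokens[-1] == RET:
        call = len(tokens) - 1 - tokens[::-1].index(CALL)  # last [CALL]
        sep = tokens.index(SEP, call)
        answer = tokens[sep + 1:-1]
        return tokens[:call] + answer
    return tokens

trace = ["x=2", CALL, "2+2", "=", "4", SEP, "4", RET]
# after reduction only the answer survives: ["x=2", "4"]
```

In the actual model the reduction is learned and applied during autoregressive generation; the point of the sketch is only that context length after each reduction depends on the answer size, not on the length of the discarded intermediate thoughts.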


How much do LLMs learn from negative examples?

Hamdan, Shadi, Yuret, Deniz

arXiv.org Artificial Intelligence

Large language models (LLMs) undergo a three-phase training process: unsupervised pre-training, supervised fine-tuning (SFT), and learning from human feedback (RLHF/DPO). Notably, it is during the final phase that these models are exposed to negative examples -- incorrect, rejected, or suboptimal responses to queries. This paper delves into the role of negative examples in the training of LLMs, using a likelihood-ratio (Likra) model on multiple-choice question answering benchmarks to precisely manage the influence and the volume of negative examples. Our findings reveal three key insights: (1) During a critical phase in training, Likra with negative examples demonstrates a significantly larger improvement per training example compared to SFT using only positive examples. This leads to a sharp jump in the learning curve for Likra unlike the smooth and gradual improvement of SFT; (2) negative examples that are plausible but incorrect (near-misses) exert a greater influence; and (3) while training with positive examples fails to significantly decrease the likelihood of plausible but incorrect answers, training with negative examples more accurately identifies them. These results indicate a potentially significant role for negative examples in improving accuracy and reducing hallucinations for LLMs.
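A likelihood-ratio model of this kind can be sketched as scoring each candidate answer by the gap between its log-probability under a positively-tuned model and under a negatively-tuned one. The function names and the probabilities below are made-up stand-ins, not the paper's implementation:

```python
import math

def likra_score(logp_plus, logp_minus, answer):
    """Likelihood-ratio score: high when the positive model likes the
    answer and the negative model (tuned on near-misses) does not."""
    return logp_plus[answer] - logp_minus[answer]

def pick(choices, logp_plus, logp_minus):
    return max(choices, key=lambda a: likra_score(logp_plus, logp_minus, a))

choices = ["Paris", "Lyon"]
logp_plus = {"Paris": math.log(0.6), "Lyon": math.log(0.4)}
logp_minus = {"Paris": math.log(0.2), "Lyon": math.log(0.8)}  # negatives favor the near-miss
# pick(...) returns "Paris": likely under the positive model, unlikely under the negative one
```

This makes the paper's third finding concrete: a plausible-but-incorrect choice that the positive model alone might rank highly is pushed down by its high likelihood under the negative model.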


I've got the "Answer"! Interpretation of LLMs Hidden States in Question Answering

Goloviznina, Valeriya, Kotelnikov, Evgeny

arXiv.org Artificial Intelligence

Interpretability and explainability of AI are becoming increasingly important in light of the rapid development of large language models (LLMs). This paper investigates the interpretation of LLMs in the context of knowledge-based question answering. The main hypothesis of the study is that correct and incorrect model behavior can be distinguished at the level of hidden states. The quantized models LLaMA-2-7B-Chat, Mistral-7B, Vicuna-7B and the MuSeRC question-answering dataset are used to test this hypothesis. The results of the analysis support the proposed hypothesis. We also identify the layers which have a negative effect on the model's behavior. As a practical application of the hypothesis, we propose additional training of these "weak" layers in order to improve the quality of the task solution.
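The core hypothesis is that a simple classifier over hidden-state vectors can separate correct from incorrect answers. A minimal sketch, using toy 2-D "hidden states" and a perceptron in place of real LLM activations and whatever probe the paper trains:

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Train a linear probe on (hidden_state, label) pairs,
    where the label is +1 (correct answer) or -1 (incorrect)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in samples:
            if y * (w[0] * x[0] + w[1] * x[1] + b) <= 0:  # misclassified
                w = [w[0] + lr * y * x[0], w[1] + lr * y * x[1]]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1

# toy hidden states: correct answers cluster in one region, incorrect in another
data = [([1.0, 0.9], 1), ([0.9, 1.1], 1),
        ([-1.0, -0.8], -1), ([-1.1, -0.9], -1)]
w, b = train_perceptron(data)
```

If such a probe separates the two classes well at a given layer, that layer's representations carry a correctness signal; layers where the probe fails are candidates for the "weak" layers the authors propose to train further.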


Neighboring Perturbations of Knowledge Editing on Large Language Models

Ma, Jun-Yu, Gu, Jia-Chen, Zhang, Ningyu, Ling, Zhen-Hua

arXiv.org Artificial Intelligence

Despite their exceptional capabilities, large language models (LLMs) are prone to generating unintended text due to false or outdated knowledge. Given the resource-intensive nature of retraining LLMs, there has been a notable increase in the development of knowledge editing. However, current approaches and evaluations rarely explore the perturbation of editing on neighboring knowledge. This paper studies whether updating new knowledge to LLMs perturbs the neighboring knowledge encapsulated within them. Specifically, we examine whether appending a new answer to the answer list of a factual question leads to catastrophic forgetting of the original correct answers in this list, as well as unintentional inclusion of incorrect answers. A metric of additivity is introduced and a benchmark dubbed Perturbation Evaluation of Appending Knowledge (PEAK) is constructed to evaluate the degree of perturbation to neighboring knowledge when appending new knowledge. Besides, a plug-and-play framework termed Appending via Preservation and Prevention (APP) is proposed to mitigate the neighboring perturbation by maintaining the integrity of the answer list. Experiments demonstrate the effectiveness of APP coupled with four editing methods on three LLMs.
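The additivity idea can be sketched as a set comparison over answer lists before and after an edit; the function name and the exact formula below are illustrative, not the paper's definition of the metric:

```python
def additivity(before, after, incorrect):
    """Illustrative additivity score: fraction of the original answer
    list preserved after the edit, penalized by any known-incorrect
    answers that leak into the post-edit list."""
    retained = len(before & after) / len(before)
    leaked = len(after & incorrect) / max(len(after), 1)
    return retained - leaked

# "Which countries border Germany?" -- an edit appends "Denmark"
before = {"France", "Poland", "Austria"}      # answers before the edit
after = {"France", "Poland", "Denmark"}       # "Austria" was forgotten
incorrect = {"Spain"}                         # known-wrong neighbors
score = additivity(before, after, incorrect)  # 2/3: one original answer lost
```

A perfect edit would keep all of `before`, add the new answer, and admit nothing from `incorrect`, scoring 1.0 under this toy formulation.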


ChatGPT owner in probe over risks around false answers

BBC News

This spring, Congress hosted OpenAI's chief executive Sam Altman for a hearing, in which he admitted the technology could be a source of errors. He called for regulations to be crafted for the emerging industry and recommended that a new agency be formed to tackle it. He said he expected the technology to have a significant impact as its uses become clear, including on jobs.


The truth about artificial intelligence? It isn't that honest John Naughton

The Guardian

We are, as the critic George Steiner observed, "language animals". Perhaps that's why we are fascinated by other creatures that appear to have language – dolphins, whales, apes, birds and so on. In her fascinating book, Atlas of AI, Kate Crawford relates how, at the end of the 19th century, Europe was captivated by a horse called Hans that apparently could solve maths problems, tell the time, identify days on a calendar, differentiate musical tones and spell out words and sentences by tapping his hooves. Even the staid New York Times was captivated, calling him "Berlin's wonderful horse; he can do almost everything but talk". It was, of course, baloney: the horse was trained to pick up subtle signs of what his owner wanted him to do.


Falsehoods more likely with large language models

#artificialintelligence

The use of AI language models to generate text for business applications is gaining steam. Large companies are deploying their own systems, while others are leveraging models like OpenAI's GPT-3 via APIs. According to OpenAI, GPT-3 is now being used in over 300 apps by thousands of developers, producing an average of more than 4.5 billion novel words per day. But while recent language models are impressively fluent, they have a tendency to write falsehoods ranging from factual inaccuracies to potentially harmful disinformation.


TruthfulQA: Measuring How Models Mimic Human Falsehoods

Lin, Stephanie, Hilton, Jacob, Evans, Owain

arXiv.org Artificial Intelligence

We propose a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. We crafted questions that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers learned from imitating human texts. We tested GPT-3, GPT-Neo/J, GPT-2 and a T5-based model. The best model was truthful on 58% of questions, while human performance was 94%. Models generated many false answers that mimic popular misconceptions and have the potential to deceive humans. The largest models were generally the least truthful. For example, the 6B-parameter GPT-J model was 17% less truthful than its 125M-parameter counterpart. This contrasts with other NLP tasks, where performance improves with model size. However, this result is expected if false answers are learned from the training distribution. We suggest that scaling up models alone is less promising for improving truthfulness than fine-tuning using training objectives other than imitation of text from the web.
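The headline numbers (58% truthful vs. 94% for humans) come from judging each answer against reference sets of true answers. A hedged sketch of that scoring step, using exact string match as a stand-in for the human or model-based judging the benchmark actually uses:

```python
def truthful_rate(predictions, true_refs):
    """Fraction of model answers that match a reference true answer
    (exact match here; the real benchmark uses human or learned judges)."""
    hits = sum(p in refs for p, refs in zip(predictions, true_refs))
    return hits / len(predictions)

# one truthful answer, one popular-misconception answer
preds = ["Nothing happens", "You will get sick"]
refs = [{"Nothing happens", "The seeds pass through you"},
        {"Nothing in particular happens"}]
# truthful_rate(preds, refs) == 0.5
```

The paper's finding that larger models score worse under this kind of metric is consistent with false answers being well-represented in the imitation training distribution.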